464 research outputs found

    A Graph-Based Semi-Supervised k Nearest-Neighbor Method for Nonlinear Manifold Distributed Data Classification

    $k$ Nearest Neighbors ($k$NN) is one of the most widely used supervised learning algorithms for classifying Gaussian distributed data, but it does not achieve good results when applied to nonlinear manifold distributed data, especially when only a very limited number of labeled samples is available. In this paper, we propose a new graph-based $k$NN algorithm which can effectively handle both Gaussian distributed data and nonlinear manifold distributed data. To achieve this goal, we first propose a constrained Tired Random Walk (TRW) by constructing an $R$-level nearest-neighbor strengthened tree over the graph, and then compute a TRW matrix for similarity measurement purposes. After this, the nearest neighbors are identified according to the TRW matrix, and the class label of a query point is determined by the sum of all the TRW weights of its nearest neighbors. To deal with online situations, we also propose a new algorithm to handle sequential samples based on a local neighborhood reconstruction. Comparison experiments are conducted on both synthetic and real-world data sets to demonstrate the validity of the proposed $k$NN algorithm and its improvements over other versions of the $k$NN algorithm. Given the widespread appearance of manifold structures in real-world problems and the popularity of the traditional $k$NN algorithm, the proposed manifold version of $k$NN shows promising potential for classifying manifold-distributed data. Comment: 32 pages, 12 figures, 7 tables

    Folded Polynomial Codes for Coded Distributed $AA^\top$-Type Matrix Multiplication

    In this paper, motivated by its value in practical applications, we consider the coded distributed matrix multiplication problem of computing $AA^\top$ in a distributed computing system with $N$ worker nodes and a master node, where the input matrices $A$ and $A^\top$ are partitioned into $p$-by-$m$ and $m$-by-$p$ blocks of equal-size sub-matrices, respectively. For effective straggler mitigation, we propose a novel computation strategy, named the folded polynomial code, which is obtained by modifying the entangled polynomial codes. Moreover, we characterize a lower bound on the optimal recovery threshold among all linear computation strategies when the underlying field is the field of real numbers, and our folded polynomial codes achieve this bound in the case of $m=1$. Compared with all known computation strategies for coded distributed matrix multiplication, our folded polynomial codes outperform them in terms of recovery threshold, download cost, and decoding complexity. Comment: 14 pages, 2 tables

    The Lamb shift in the BTZ spacetime

    We study the Lamb shift of a two-level atom arising from its coupling to a conformal massless scalar field, which satisfies Dirichlet boundary conditions, in the Hartle-Hawking vacuum in the BTZ spacetime, and find that the Lamb shift in the BTZ spacetime is structurally similar to that of a uniformly accelerated atom near a perfectly reflecting boundary in (2+1)-dimensional flat spacetime. Our results show that the Lamb shift is suppressed in the BTZ spacetime, as compared to that in flat spacetime, as long as the transition wavelength of the atom is much larger than the AdS radius of the BTZ spacetime, while it can be either suppressed or enhanced if the transition wavelength of the atom is much less than the AdS radius. In contrast, the Lamb shift is always suppressed very close to the horizon of the BTZ spacetime and, remarkably, it reduces to that in flat spacetime as the horizon is approached, although the local temperature blows up there. Comment: 21 pages, 2 figures

    SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models

    Current speech large language models build upon discrete speech representations, which can be categorized into semantic tokens and acoustic tokens. However, existing speech tokens are not specifically designed for speech language modeling. To assess the suitability of speech tokens for building speech language models, we established the first benchmark, SLMTokBench. Our results indicate that neither semantic nor acoustic tokens are ideal for this purpose. Therefore, we propose SpeechTokenizer, a unified speech tokenizer for speech large language models. SpeechTokenizer adopts the Encoder-Decoder architecture with residual vector quantization (RVQ). Unifying semantic and acoustic tokens, SpeechTokenizer disentangles different aspects of speech information hierarchically across different RVQ layers. Furthermore, we construct a Unified Speech Language Model (USLM) leveraging SpeechTokenizer. Experiments show that SpeechTokenizer performs comparably to EnCodec in speech reconstruction and demonstrates strong performance on the SLMTokBench benchmark. Also, USLM outperforms VALL-E in zero-shot Text-to-Speech tasks. Code and models are available at https://github.com/ZhangXInFD/SpeechTokenizer/. Comment: SpeechTokenizer project page is https://0nutation.github.io/SpeechTokenizer.github.io
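
The residual vector quantization (RVQ) mentioned in the abstract above can be illustrated with a minimal sketch. This is a generic RVQ encoder/decoder and not the SpeechTokenizer implementation: the function names `rvq_encode` and `rvq_decode` and the toy two-layer codebooks are assumptions made for illustration, and real systems learn their codebooks from data.

```python
def nearest(codebook, vec):
    """Return the index of the codeword closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(vec, codebooks):
    """Quantize vec layer by layer; each layer encodes the residual left by the previous one."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        # Subtract the chosen codeword so the next layer only sees what is still unexplained.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(indices, codebooks):
    """Reconstruct the vector as the sum of the selected codewords from every layer."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy codebooks: a coarse first layer and a finer second layer.
toy_codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, -0.25]],
]
codes = rvq_encode([1.2, 0.8], toy_codebooks)
approx = rvq_decode(codes, toy_codebooks)
```

Each layer produces one token per input frame, which is the property that lets a hierarchical tokenizer assign different kinds of information to different layers.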

    Classification of C3 and C4 Vegetation Types Using MODIS and ETM+ Blended High Spatio-Temporal Resolution Data

    The distribution of C3 and C4 vegetation plays an important role in the global carbon cycle and climate change. Knowledge of the distribution of C3 and C4 vegetation at a high spatial resolution over local or regional scales helps us to understand their ecological functions and climate dependencies. In this study, we classified C3 and C4 vegetation at a high resolution for spatially heterogeneous landscapes. First, we generated a land surface reflectance dataset with high spatial and temporal resolution by blending MODIS (Moderate Resolution Imaging Spectroradiometer) and ETM+ (Enhanced Thematic Mapper Plus) data. The blended data exhibited a high correlation ($R^2 = 0.88$) with the satellite-derived ETM+ data. Time-series NDVI (Normalized Difference Vegetation Index) data were then generated from the blended high spatio-temporal resolution data to capture the phenological differences between the C3 and C4 vegetation. The time-series NDVI revealed that the C3 vegetation turns green earlier in spring and senesces later in autumn than the C4 vegetation, while the C4 vegetation has a higher NDVI value than the C3 vegetation during summer. Based on these distinguishing characteristics, the time-series NDVI was used to extract the C3 and C4 classification features. Five features were selected from the 18 classification features according to the ground investigation data, and subsequently used for the C3 and C4 classification. The overall accuracy of the C3 and C4 vegetation classification was 85.75% with a kappa of 0.725 in our study area.
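
As a small illustration of the NDVI time series the abstract above relies on, the sketch below applies the standard definition NDVI = (NIR - Red) / (NIR + Red) to a sequence of reflectance pairs. The function names and the sample reflectance values are hypothetical and are not taken from the study.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from near-infrared and red reflectance."""
    total = nir + red
    if total == 0:
        # Convention chosen for this sketch: no signal in either band maps to 0.
        return 0.0
    return (nir - red) / total


def ndvi_series(observations):
    """Map a list of (nir, red) reflectance pairs, one per date, to NDVI values."""
    return [ndvi(nir, red) for nir, red in observations]


# Hypothetical reflectance pairs for one pixel on three dates (spring, summer, autumn).
pixel_observations = [(0.30, 0.10), (0.50, 0.05), (0.25, 0.15)]
pixel_ndvi = ndvi_series(pixel_observations)
```

Features such as green-up date or summer peak would then be derived from `pixel_ndvi`; that step is omitted here because the abstract does not specify the 18 candidate features.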

    New lower order mixed finite element methods for linear elasticity

    New lower order $H(\textrm{div})$-conforming finite elements for symmetric tensors are constructed in arbitrary dimension. The space of shape functions is defined by enriching the symmetric quadratic polynomial space with the $(d+1)$-order normal-normal face bubble space. The reduced counterpart has only $d(d+1)^2$ degrees of freedom. In two dimensions, basis functions are explicitly given in terms of barycentric coordinates. Lower order conforming finite element elasticity complexes starting from the Bell element are developed in two dimensions. These finite elements for symmetric tensors are applied to devise robust mixed finite element methods for the linear elasticity problem, which possess uniform error estimates with respect to the Lamé coefficient $\lambda$ and superconvergence for the displacement. Numerical results are provided to verify the theoretical convergence rates. Comment: 23 pages, 2 figures

    Accurate and Efficient Calculation of Three-Dimensional Cost Distance

    Cost distance is one of the fundamental functions in geographical information systems (GISs). A 3D cost distance function makes the analysis of movement through 3D frictions possible. In this paper, we propose an algorithm and efficient data structures to accurately calculate the cost distance in discrete 3D space. Specifically, Dijkstra’s algorithm is used to calculate the least cost between initial voxels and all the other voxels in 3D space. During the calculation, unnecessary bends along the travel path are constantly corrected to retain the accurate least cost. Our results show that the proposed algorithm can generate true Euclidean distance in homogeneous frictions and can provide a more accurate least cost in heterogeneous frictions than several existing methods. Furthermore, the proposed data structures, i.e., a heap combined with a hash table, significantly improve the algorithm’s efficiency. The algorithm and data structures have been verified via several applications, including planning the shortest drone delivery path in an urban environment, generating volumetric viewsheds, and calculating the minimum hydraulic resistance.
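
The combination of Dijkstra's algorithm with a heap and a hash table described in the abstract above can be sketched as follows. This is a baseline only, under stated assumptions: it uses 26-connected voxels and charges each step its Euclidean length times the mean friction of the two voxels, and it does not implement the paper's bend correction. The name `cost_distance_3d` and the dictionary-based grid are hypothetical choices for illustration.

```python
import heapq
import math
from itertools import product


def cost_distance_3d(friction, sources):
    """Least accumulated cost from any source voxel to every reachable voxel.

    friction: dict mapping (x, y, z) -> positive friction value (the hash table).
    sources:  iterable of voxels that are keys of friction.
    """
    # All 26 neighbor offsets of a voxel (assumed connectivity for this sketch).
    offsets = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]
    sources = list(sources)
    best = {s: 0.0 for s in sources}      # hash table of best known costs
    heap = [(0.0, s) for s in sources]    # priority queue ordered by cost
    heapq.heapify(heap)
    while heap:
        cost, voxel = heapq.heappop(heap)
        if cost > best.get(voxel, math.inf):
            continue  # stale heap entry; a cheaper path was already found
        for dx, dy, dz in offsets:
            nb = (voxel[0] + dx, voxel[1] + dy, voxel[2] + dz)
            if nb not in friction:
                continue  # outside the grid
            step = math.sqrt(dx * dx + dy * dy + dz * dz)
            new_cost = cost + step * 0.5 * (friction[voxel] + friction[nb])
            if new_cost < best.get(nb, math.inf):
                best[nb] = new_cost
                heapq.heappush(heap, (new_cost, nb))
    return best


# Hypothetical 3x3x3 grid with uniform friction 1.0 and a single source voxel.
grid = {v: 1.0 for v in product(range(3), repeat=3)}
costs = cost_distance_3d(grid, [(0, 0, 0)])
```

On this uniform grid the baseline reaches voxel (2, 1, 0) at cost 1 + √2 ≈ 2.414 rather than the straight-line distance √5 ≈ 2.236, which is the kind of path-bend error the abstract says the proposed algorithm corrects.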